On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing

نویسندگان

  • Chris H. Q. Ding
  • Tao Li
  • Wei Peng
چکیده

Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have been successfully applied to document clustering recently. In this paper, we show that PLSI and NMF (with the I-divergence objective function) optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. This provides a theoretical basis for a new hybrid method that runs PLSI and NMF alternatively, each jumping out of local minima of the other method successively, thus achieving a better final solution. Extensive experiments on five real-life datasets show relations between NMF and PLSI, and indicate the hybrid method leads to significant improvements over NMFonly or PLSI-only methods. We also show that at first order approximation, NMF is identical to χ-statistic.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing

NMF and PLSI are two state-of-the-art unsupervised learning models in data mining, and both are widely used in many applications. References have shown the equivalence between NMF and PLSI under some conditions. However, a new issue arises here: why can they result in different solutions since they are equivalent? or in other words, their algorithm differences are not studied intensively yet. I...

متن کامل

Topic Modeling via Nonnegative Matrix Factorization on Probability Simplex

One important goal of document modeling is to extract a set of informative topics from a text corpus and produce a reduced representation of each document. In this paper, we propose a novel algorithm for this task based on nonnegative matrix factorization on a probability simplex. We further extend our algorithm by removing global and generic information to produce more diverse and specific top...

متن کامل

NMF-based Models for Tumor Clustering: A Systematic Comparison

Nonnegative Matrix Factorization (NMF) is one of the famous unsupervised learning models. In this paper, we give a short survey on NMF-related models, including K-means, Probabilistic Latent Semantic Indexing etc. and present a new Posterior Probabilistic Clustering model, and compare their numerical experimental results on five real microarray data. The results show that i) NMF using with K-L ...

متن کامل

A Tutorial on Probabilistic Latent Semantic Analysis

Historically, many believe that these three papers [7, 8, 9] established the techniques of Probabilistic Latent Semantic Analysis or PLSA for short. However, there also exists one variant of the model in [11] and indeed all these models were originally discussed in an earlier technical report [10]. In [2], the authors extended MLE-style estimation of PLSA to MAP-style estimations. A hierarchica...

متن کامل

Learning with matrix factorizations

Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or highdimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent years (Latent Sema...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Statistics & Data Analysis

دوره 52  شماره 

صفحات  -

تاریخ انتشار 2008